Abstract - In this paper we propose ν-Anomica, a novel
anomaly detection technique that can be trained on huge
data sets with much reduced running time compared to the
benchmark one-class Support Vector Machines algorithm.
In ν-Anomica, the idea is to train the machine such that
it can provide a close approximation to the exact decision
plane using fewer training points and without losing
much of the generalization performance of the classical
approach. We have tested the proposed algorithm on a
variety of continuous data sets under different conditions.
We show that under all test conditions the developed
procedure closely preserves the accuracy of standard
one-class Support Vector Machines while reducing both the
training time and the test time by a factor of 5 to 20.

Keywords - Anomaly Detection; Support Vector Machines;
Kernel; Optimization

I. Introduction 

Outlier or anomaly detection refers to the task of 
identifying abnormal or inconsistent patterns from a 
dataset. While they may seem to be undesirable entities, 
identifying them has many potential applications in 
fraud and intrusion detection, financial market analy- 
sis, medical research and safety-critical vehicle health 
management. Broadly speaking, outliers can be detected 
using either supervised or semi-supervised or unsuper- 
vised techniques [13] [5]. Unsupervised techniques, as 
the name suggests, do not require labeled instances for 
detecting outliers. In this category, the most popular 
ones are the distance-based and density based tech- 
niques. The basic idea of these techniques is that outliers 
are points in low density regions or those which are 
far from other points. In their seminal work, Knorr 
et al. [15] proposed a distance-based outlier detection 
technique based on the idea of nearest neighbors. The 
naive solution has a quadratic time complexity since 
every data point needs to be compared to every other to 
find the nearest neighbors. To overcome this, researchers 
have proposed several techniques such as the work 
by Angiulli and Pizzuti [1], Ramaswamy et al. [17],
and Bay and Schwabacher [2]. Density-based outlier
detection schemes, on the other hand, flag a point as 


an outlier if the point is in a low density region. The 
density of a point can be evaluated using several tech- 
niques such as the ones proposed in [12]. Supervised 
techniques require labeled instances of both normal and 
abnormal operation data for first building a model (e.g., a
classifier) and then testing if an unknown data point is a 
normal one or an outlier. The model can be probabilistic 
such as Bayesian inference [9] or deterministic such as 
decision trees, Support Vector Machines (SVMs) and 
neural networks [14]. Semi-supervised techniques only 
require labeled instances of normal data. Hence they are 
more widely applicable than the fully supervised ones. 
These techniques build models of normal data and then 
flag as outliers all those points which do not fit the 
model. 

Since this paper proposes a variant of an unsupervised
anomaly detection technique based on support vector
machines, we discuss this family of methods in more detail
here. Support vector machines [21] [7] have been widely
used for classification and regression. While the original
idea of using SVMs has been around for many years, recent
interest has been kindled by the need for analyzing
large datasets. Fehr et al. [10] present a scheme for
efficient learning of SVMs based on the intuition that 
most of the training time for non-linear SVMs is wasted 
in evaluating the kernel matrix. In their approach, they 
approximate a single SVM using a collection of simpler 
linear SVMs. Each of these simpler ones can be trained 
and tested in constant time, leading to low running time 
without any loss of accuracy. Such a construction can 
be viewed as a tree in which any intermediate node 
represents a hyper-plane and the leaf nodes correspond 
to pure labels of one class type. 

Burges and Schölkopf [4] present a different technique
for speeding up SVMs. Let

$$\Psi = \sum_{i=1}^{N_s} \alpha_i y_i \Phi(s_i)$$

be the normal to the decision surface, where the $\alpha_i$
denote the Lagrange multipliers corresponding to the support
vectors $s_i$, $y_i$ denotes the true class labels, $\Phi(\cdot)$
denotes the mapping associated with the kernel function, and
$N_s$ denotes the number of support vectors. This computation
scales linearly with the number of support vectors. To achieve
speedup, the authors propose to approximate the normal using
fewer vectors ($N_z < N_s$) as

$$\Psi' = \sum_{i=1}^{N_z} \gamma_i \Phi(z_i)$$

where $\gamma_i$ and $z_i$ are the coefficients and vectors of the
reduced expansion. The goal is then to minimize the L2-norm of
the difference between the two normal vectors,

$$\rho = \lVert \Psi - \Psi' \rVert$$

As has been shown in [4], there exist nontrivial values
of $\Psi'$ which ensure $\rho \approx 0$.
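
The distance between a full and a reduced expansion can be evaluated with kernel values only, because the squared norm expands into three double sums over kernel evaluations. The following is a minimal sketch of that computation, not the procedure of [4]: the function names are hypothetical, a linear kernel is assumed so the result is easy to check by hand, and the label factors are assumed to be folded into the coefficients.

```python
import math

def linear_kernel(u, v):
    """Plain dot product; stands in for k(u, v) = <Phi(u), Phi(v)> (assumed kernel)."""
    return sum(a * b for a, b in zip(u, v))

def expansion_distance(coef_a, vecs_a, coef_b, vecs_b, kernel=linear_kernel):
    """Return ||Psi_a - Psi_b|| for two expansions Psi = sum_i c_i Phi(v_i)."""
    def cross(c1, v1, c2, v2):
        return sum(ci * cj * kernel(vi, vj)
                   for ci, vi in zip(c1, v1)
                   for cj, vj in zip(c2, v2))
    squared = (cross(coef_a, vecs_a, coef_a, vecs_a)
               - 2.0 * cross(coef_a, vecs_a, coef_b, vecs_b)
               + cross(coef_b, vecs_b, coef_b, vecs_b))
    return math.sqrt(max(squared, 0.0))  # clamp tiny negative rounding errors

# Three support vectors, two of which are duplicates (hypothetical values).
full_coef = [0.5, 0.5, 1.0]
full_vecs = [(1.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
# Two-vector expansion that merges the duplicates.
reduced_coef = [1.0, 1.0]
reduced_vecs = [(1.0, 0.0), (0.0, 2.0)]

rho = expansion_distance(full_coef, full_vecs, reduced_coef, reduced_vecs)
```

In this toy case the reduced expansion reproduces the normal exactly, so the distance is zero even though one vector has been dropped.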

The work most closely related to this one is the 
reduced support vector machine (RSVM) idea presented 
in [16] and [6]. In these, an initial SVM is trained not
on the entire training set, but rather on a subset of the 
training set called the active training set. Then, the SVM 
is evaluated on a validation set. If the accuracy is accept- 
able, the algorithm converges, else a set of misclassified 
points are selected from the remaining training set and 
added to the active training set. The approach in [6] first 
sorts the misclassified points according to their scores 
on the validation set and then divides the points into 
equal size subsets. When additional points are needed, 
it selects new points from each subset. In our approach 
we do not sort the points and thereby achieve lower 
running time. 

The proposed ν-Anomica algorithm is faster than the
standard benchmark one-class SVMs while preserving
the accuracy. It achieves this by developing the
hyperplane in an incremental fashion. We show that, in
many cases, ν-Anomica has similar prediction accuracy
compared to the classical one-class SVM while reducing
the running time dramatically. Our main contributions
in this paper are:

• We propose a variant of the one-class SVM-based
novelty detection algorithm called ν-Anomica with
improved running time while retaining the accuracy
of standard one-class SVMs.

• We demonstrate the capability of the algorithm in 
handling huge sizes of training data (both instances 
and attributes). 

• We measure the performance of the proposed tech- 
nique using different metrics, such as accuracy, 
sensitivity, and run time. 

• We provide some useful insights regarding the
effectiveness of the proposed technique based on the
experimental evaluation.

II. Novelty Detection with One Class SVMs 

One-class SVMs, an unsupervised learning method
for estimating the density of the target support objects,
were introduced by Schölkopf [18]. Throughout this
paper, we consider positively labeled data points
as normal and negatively labeled data points as outliers.
The model has a parameter ν that denotes the
maximum allowance of outliers in the training data. The
idea is to draw a separating hyperplane that can separate
these outliers from the rest of the training examples, as
shown in Fig. 1. Unlike the two-class SVM classifier,
in the one-class SVM model the separating hyperplane is
constructed using the positively labeled training data only.
Since an (N − 1)-dimensional hyperplane can exist in the
N-dimensional feature space, the primary task is to find
the optimal separating plane that maximizes the margin
between the hyperplane and the origin, which is the lone
representative of the second class with negative label.

A. The Model 



Figure 1. This figure illustrates the geometric interpretation of 
optimal hyperplane for one class SVMs. 


We assume a set of labeled training data
$\mathcal{D} = \{x_i\}_{i=1}^{\ell}$ in the input space $\mathcal{R}$,
where $x_i \in \mathbb{R}^d$. We further assume that there exists a
function $\phi$ that can be used to map variables from the input
space to the feature space $\mathcal{F}$, i.e. $\phi : \mathbb{R}^d \to \mathcal{F}$.
In the feature space the inner product property
$\langle \mathbf{x}_i, \mathbf{x}_j \rangle$ holds, where $\mathbf{x}_i := \phi(x_i)$.
Also, Cover's theorem [21] states that nonseparable or
nonlinearly separable features in the input space are more
likely to be linearly separable in the feature space
$\mathcal{F}$, provided the transformation $\phi(\cdot)$ is
nonlinear and the dimensionality of the feature space is
high enough. While evaluating the dot product in the
feature space, the explicit calculation using $\phi$ can be
avoided by simply evaluating the kernel function, i.e.
$k(x_i, x_j) := \langle \phi(x_i), \phi(x_j) \rangle$. However, in order
for this to hold, the chosen inner-product kernel must satisfy
Mercer's theorem [3]. For the majority of this paper, we
have used the Radial Basis Function (RBF) kernel (Eqn. 1),
which evaluates the distances between data points as

$$k(x_i, x_j) = e^{-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}} \tag{1}$$

where $\lVert \cdot \rVert$ denotes the Euclidean norm and $\sigma$
defines the kernel width.
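
As a small illustration of Eqn. 1, the sketch below evaluates the RBF kernel for a few 2-dimensional points and assembles the kernel matrix. It assumes the common $2\sigma^2$ scaling of the kernel width, and the function names `rbf_kernel` and `kernel_matrix` are hypothetical helpers, not part of any toolbox used in this paper.

```python
import math

def rbf_kernel(x_i, x_j, sigma):
    """k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 * sigma^2)); the 2*sigma^2 scaling is assumed."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_matrix(points, sigma):
    """Full matrix of pairwise kernel values (quadratic in the number of points)."""
    return [[rbf_kernel(p, q, sigma) for q in points] for p in points]

points = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
K = kernel_matrix(points, sigma=1.0)
```

The matrix is symmetric with ones on the diagonal, and its size grows quadratically with the number of training points, which is the cost a reduced working set avoids.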

Schölkopf [18] showed that in the high dimensional
feature space it is possible to construct an optimal
hyperplane by maximizing the margin between the origin
and the hyperplane in the feature space by solving the
following primal optimization problem,

$$\begin{aligned}
\text{minimize} \quad & P(w, \rho, \xi) = \frac{1}{2} w w^T + \frac{1}{\nu\ell} \sum_{i=1}^{\ell} \xi_i - \rho \\
\text{subject to} \quad & (w \cdot \phi(x_i)) \ge \rho - \xi_i, \quad \xi_i \ge 0, \quad \nu \in [0, 1]
\end{aligned} \tag{2}$$

where $\nu$ is a user-specified parameter that defines the
upper bound on the training error and also the lower
bound on the fraction of training points that are support
vectors, $\xi_i$ is the non-negative slack variable, $\rho$ is the
offset, $\phi(x_i)$ represents the transformed image of $x_i$ in the
Euclidean space and $i \in [\ell]$. Throughout this study, we
will use the scaled version [8] of the dual problem, which
takes the form


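$$\begin{aligned}
\text{minimize} \quad & Q = \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j k(x_i, x_j) + \rho \Big( \nu\ell - \sum_{i} \alpha_i \Big) \\
\text{subject to} \quad & 0 \le \alpha_i \le 1, \quad \nu \in [0, 1]
\end{aligned} \tag{3}$$

where $\alpha_i$ and $\rho$ are Lagrangian multipliers. The optimal
solution must satisfy the exact Karush-Kuhn-Tucker
(KKT) conditions, which can be summarized as

$$\begin{array}{lll}
\alpha_i = 1 & g(x_i) \le \rho & \xi_i \ge 0 \\
0 < \alpha_i < 1 & g(x_i) = \rho & \xi_i = 0 \\
\alpha_i = 0 & g(x_i) \ge \rho & \xi_i = 0
\end{array} \tag{4}$$

where $g(x_j) = \sum_i \alpha_i k(x_i, x_j)$. The value of
$\rho$ can be recovered from the constraint of the primal
problem by exploiting the solution $w$ and a pattern $x_i$
corresponding to $0 < \alpha_i < 1$ while setting $\xi_i = 0$ under
the equality condition. There exist at least $\nu\ell$ training points
with non-zero Lagrangian multipliers ($\alpha$) and these
points $\{x_i : i \in [\ell], \alpha_i > 0\}$ are called support vectors.
Let $\mathcal{I}_0 = \{i : \alpha_i = 0\}$, $\mathcal{I}_m = \{i : 0 < \alpha_i < 1\}$ and
$\mathcal{I}_{nm} = \{i : \alpha_i = 1\}$ be the sets of indices of Lagrangian
multipliers corresponding to non-SVs, marginal and
non-marginal support vectors respectively. Once $\alpha$ is
known, SVMs compute the following decision function.

$$f(x_j) = \operatorname{sign}\Big( \sum_{i \in \mathcal{I}_m} \alpha_i k(x_i, x_j) + \sum_{i \in \mathcal{I}_{nm}} k(x_i, x_j) - \rho \Big) \tag{5}$$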

If the decision function predicts a negative label for 
a given test point Xj, this implies that the test point is 
classified as outlier. Test examples with positive labels 
are considered as normal. 
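
To make the decision rule concrete, the sketch below groups multipliers into non-SVs, margin SVs and non-margin SVs and then evaluates the decision function for two test points. The multipliers, offset and kernel width are hand-picked hypothetical values rather than the solution of an actual quadratic program, the function names are illustrative, and the RBF kernel again assumes a $2\sigma^2$ scaling.

```python
import math

def rbf_kernel(x_i, x_j, sigma):
    sq = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-sq / (2.0 * sigma ** 2))  # assumed scaling of the kernel width

def split_by_alpha(alpha, tol=1e-8):
    """Indices of non-SVs (alpha = 0), margin SVs (0 < alpha < 1), non-margin SVs (alpha = 1)."""
    non_sv = [i for i, a in enumerate(alpha) if a <= tol]
    margin = [i for i, a in enumerate(alpha) if tol < a < 1.0 - tol]
    non_margin = [i for i, a in enumerate(alpha) if a >= 1.0 - tol]
    return non_sv, margin, non_margin

def decision(z, train, alpha, rho, sigma):
    """Return (g(z) - rho, label); label -1 flags z as an outlier."""
    _, margin, non_margin = split_by_alpha(alpha)
    g = sum(alpha[i] * rbf_kernel(train[i], z, sigma) for i in margin)
    g += sum(rbf_kernel(train[i], z, sigma) for i in non_margin)  # alpha_i = 1 here
    score = g - rho
    return score, (1 if score >= 0 else -1)

# Hypothetical, hand-picked values; not the output of a trained model.
train = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (2.0, 2.0)]
alpha = [0.0, 0.4, 0.0, 1.0]
rho = 0.3
near_score, near_label = decision((0.1, 0.05), train, alpha, rho, sigma=0.5)
far_score, far_label = decision((5.0, 5.0), train, alpha, rho, sigma=0.5)
```

Only the support vectors contribute to the sums, so the cost of scoring a test point is proportional to their number.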

B. Virtual Decision Surface 

The decision boundary is defined by a normal vector
$w$ (also referred to as the weight vector), which is orthogonal
to the plane, and an offset $\rho$. All points $x$ lying on
this hyperplane must satisfy $g(x) - \rho = 0$, where
$\{g(x) = w \cdot x, \forall w \in \mathcal{F}\}$. Since the weight vector is
a weighted sum of the features corresponding to the
support vectors, one may be motivated to define two
normal vectors $\omega$ and $\lambda$, both perpendicular to the
decision plane, such that

$$\frac{\omega}{\lVert \omega \rVert} = \frac{\lambda}{\lVert \lambda \rVert} = x_n \tag{6}$$

where $x_n$ is the unit normal along $\omega$ and $\lambda$. It is not
too difficult to prove that

$$\frac{g_{\omega}(z)}{g_{\lambda}(z)} = \frac{\lVert \omega \rVert}{\lVert \lambda \rVert} = \frac{\omega_0}{\lambda_0} \tag{7}$$

where $\omega_0$ and $\lambda_0$ are the offset terms corresponding
to the normal vectors $\omega$ and $\lambda$. This is because the distance
of the hyperplane from the origin remains unchanged,
i.e. $\omega_0 / \lVert \omega \rVert = \lambda_0 / \lVert \lambda \rVert$. An important
conclusion is that for a fixed test point $z$, the ratio of the
decision values evaluated using two different normal vectors
(defined by two different sets of points) orthogonal to the same
hyperplane is constant. This can further be expressed as

$$\frac{f_{\alpha}(z)}{f_{\beta}(z)} = \frac{\sum_{i \in \mathcal{I}_m, \mathcal{I}_{nm}} \alpha_i k(x_i, z) - \rho_{\alpha}}{\sum_{i \in \mathcal{I}_m, \mathcal{I}_{nm}} \beta_i k(x_i, z) - \rho_{\beta}} = \varrho \tag{8}$$

where $\varrho$ is a constant, and $f_{\alpha}(z)$ and $f_{\beta}(z)$ are the
decision functions (Eqn. 5) expressed in terms of the support
vectors corresponding to the Lagrange multipliers $\alpha_i$
and $\beta_i$. The fact that the members of
$\mathcal{I}_m \cup \mathcal{I}_{nm}$ for the two solutions may differ in
number leads to the fact that the
construction of the weight vector does not depend on
the number of support vectors. It is well known that the
positive semidefiniteness of the dual problem may result
in redundant support vectors which define the normal
vector. This means that some of the support vectors are a
linear combination of other support vectors and implies
that the removal of some of these linearly dependent
support vectors will not change the hyperplane. In
previous work [4], Burges and Schölkopf pointed out
that the solution of the SVMs may not be the sparsest
one and suggested ways of approximating the solution
using virtual support vectors. For one-class SVMs, the
existence of the parameter $\nu$ may be the source that
introduces redundancies in the solution because it leads
to a minimum required number of support vectors. In
this research we are motivated to develop a scheme that
searches for a reduced set of the transformed features in
$\mathcal{F}$ which is sufficiently close to approximate the normal
vector of the exact solution of one-class SVMs, thus
retaining the same accuracy with lower running time.
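
The constant-ratio property can be checked numerically for the simplest case. The sketch below assumes a linear kernel, so that the decision value is a plain dot product, and uses two hypothetical normal vectors that point along the same unit normal but have different lengths.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

unit_normal = (0.6, 0.8)                       # x_n, with Euclidean norm 1
omega = tuple(2.0 * c for c in unit_normal)    # ||omega|| = 2
lam = tuple(5.0 * c for c in unit_normal)      # ||lambda|| = 5

# Arbitrary test points, none of them orthogonal to the normal.
test_points = [(1.0, 0.0), (0.0, 1.0), (3.0, -1.0), (-2.0, 4.0)]
ratios = [dot(omega, z) / dot(lam, z) for z in test_points]
```

Every ratio equals the ratio of the two norms (2/5 here), independent of the test point, which is why scores from an approximate and an exact model can coincide after normalization.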

III. Proposed Approach: ν-Anomica

ν-Anomica proposes an approximate solution that
permits one-class SVMs to train on huge data sets in
much reduced time. The main idea of this algorithm is
to start with an initial “feasible solution” of the classical
one-class model trained on a very reduced data set and
guide the current solution towards the “target solution”.
Here the solution of the optimal hyperplane from the
exact solution is set as the target. To achieve this
goal, a controlled updating of the existing training pool
with new examples in an iterative fashion has been
adopted. In order to select the appropriate subset of new
examples, we propose a two stage strategy. In the first
step, we ensure that at each iteration the solution of
the most updated model is along the direction of the
optimal solution. Secondly, at each step the number of
new members, which controls the step length, is decided
based on some model feedback. The work presented
here exploits the fact that the ν parameter of one-class
SVMs plays a very important role in defining the highest
allowable fraction of misclassification of the training
data. This means the one-class model, once built, should
be able to correctly classify a (1 − ν) fraction of the entire
training set as normal examples. For the rest of the paper
we will refer to this as the “ν-criterion”. Any newly
developed model (based on a subset of the entire data
set) which is a close approximation to the exact solution
is bound to meet the “ν-criterion”. Such a data set can
be considered as a representative working set.

In the following, we will demonstrate the core idea
of the proposed algorithm in steps. The ν-Anomica
algorithm (Algorithm 1) starts with the assumption that
two non-overlapping data sets have been randomly
chosen from the same distribution. One of these two
sets is assigned for training while the second
set is kept for validation. The model also
assumes that the optimal value of the kernel parameter
σ (Eqn. 1) has already been evaluated for a fixed ν.
Under this condition, if a standard one-class model is
successfully built on the entire training set, the model
should satisfy the “ν-criterion”.


Algorithm 1 ν-Anomica
Argument:

Let the training set be $X = \{x_1, x_2, \ldots, x_p\}$, $X \in \mathbb{R}^d$. Let $X_1$ be a
randomly chosen subset of $X$, i.e. $X_1 = \{x_1, x_2, \ldots, x_\ell\} \subset X$,
where $\ell \ll p$, and let $X_2 = X \setminus X_1$. Let $Z = \{z_1, z_2, \ldots, z_r\}$, $Z \in \mathbb{R}^d$ be
the validation set such that $X \cap Z = \emptyset$, where $\emptyset$ denotes the empty
set.

Notations: $I$ represents indices.

Input: $X_1$, $X_2$, $Z$, $\sigma$ and $\nu \in [0, 1]$.

Output: Lagrangian multipliers ($\alpha^*$), Support Vectors ($SVs^*$) and
Bias ($\rho^*$)

Initialization: Variable $I^{neg} = \emptyset$ and $I^{pos} = \emptyset$;

Step A: Compute $\alpha^*$ and $\rho^*$ by minimizing

$$\frac{1}{2} \sum_{i,j \in X_1} \alpha_i \alpha_j k(x_i, x_j) + \rho \Big( \ell\nu - \sum_{i \in X_1} \alpha_i \Big)$$

subject to $0 \le \alpha_i \le 1$, $\nu \in [0, 1]$, $i \in X_1$

Step B: Obtain classification rate $C_r$ on $Z$.

$$M = \{ z_m : m \in [r], f(z_m) < 0 \}$$

$$C_r = 1 - \frac{1}{N} \sum_{n=1}^{N} \mathbb{1}(z_n \in M)$$

Step C: Check objective $E_r = C_r - (1 - \nu)$.

Step D: $[\alpha^*, SVs^*, \rho^*]$ = UpdateMember($E_r$, $X_1$, $X_2$, $Z$).
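
Steps B and C reduce to simple counting once the decision values on the validation set are available. The sketch below is a minimal illustration under the assumption that a negative decision value marks a predicted outlier; the function names and the ten validation scores are hypothetical.

```python
def classification_rate(scores):
    """C_r: fraction of validation points with a non-negative decision value."""
    flagged = [s for s in scores if s < 0]          # the set M of predicted outliers
    return 1.0 - len(flagged) / len(scores)

def nu_criterion_error(scores, nu):
    """E_r = C_r - (1 - nu); zero when the nu-criterion is met exactly."""
    return classification_rate(scores) - (1.0 - nu)

# Hypothetical decision values for ten validation points.
validation_scores = [0.8, 0.5, -0.1, 0.3, 0.9, 0.2, -0.4, 0.6, 0.7, 0.1]
c_r = classification_rate(validation_scores)        # 8 of 10 scores are non-negative
e_r = nu_criterion_error(validation_scores, nu=0.1)
```

Here the rate is 0.8 against a target of 0.9, so the error is negative and the model would be treated as under classified.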


In the proposed technique, we start by randomly
selecting a small subset from the entire training set
and using this small subset to develop the initial
one-class SVMs model. Once the SVs are obtained, we
validate the resulting model on the validation set. Since
the current model is based on a very small subset of
the entire training set, the classification accuracy of the
model may not satisfy the “ν-criterion” on the hold
out set. This is based on the fact that a correct model
should be able to achieve the same level of classification
accuracy (in this case 1 − ν because of the “ν-criterion”)
on a hold out set which has been generated from a
similar distribution to that of the training set. Here it
is important to note that the proposed algorithm uses
the “ν-criterion” as the target classification rate.

If the classification rate on the validation set is greater
than (1 − ν), it means that either the small subset of
the training set has fewer positive examples or that
the data points corresponding to the support vectors of
this model are not good representatives of the positive
examples. This is analogous to saying that the most
recently evaluated support vectors have defined a normal
vector (w) corresponding to a hyperplane (Fig. 2-b) that
predicts too many positive members in the hold out set
and thus does not satisfy the ν-criterion. Similarly, if
the classification rate is less than (1 − ν), it implies that
the current working set has too few negative examples
(Fig. 2-a). Hence there is a necessity to update the
initial working set with additional positive or negative
examples only when either of the above two situations
arises. Pseudo code of our algorithm for doing this is
shown in Algorithm 2. This procedure is repeated until
the ν-criterion is satisfied or close to being satisfied on
the hold out set. The number of examples (positive or
negative) to be selected from the entire remaining set
is governed by a penalized weight function as shown
in line 5 of the pseudo code (Algorithm 2), based on the
deviation of the classification rate on the validation set
from the target (1 − ν). Once the ν-criterion on the hold
out set is satisfied (Fig. 2-c), the algorithm meets the
stopping criterion, and hence terminates.

Figure 2. This figure shows the update rules of ν-Anomica.
Subfigures (a) and (b) represent the over classified and under classified
cases respectively. In subfigure (c) the evaluated classification rate
of the current model meets the “ν-criterion”. The target hyperplane
and the current hyperplane are represented by dotted and dashed lines
respectively.

A. How does the v-criterion influence the model? 

We will further illustrate the role of the “ν-criterion” by
using a synthetic “one class” data set. The data set
consists of samples drawn from an n-dimensional Gaussian
distribution with user specified mean (μ) and covariance
(Σ). For simplicity we will use a 2-dimensional data
set drawn from a single distribution. We have chosen
a linear kernel in the SVMs model to do the mapping.

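Algorithm 2 UpdateMember($E_r$, $X_1$, $X_2$, $Z$)

1: Let the operator $\circ$ take either > or <, but only one at a time.
2: while $E_r \neq 0$ do
3:   while $E_r \circ 0$ do
4:     $(S_{X_2} \circ 0)$
5:     Set k = length(...) and penalized weight = ... [expressions illegible in the source]
6:     Randomly select $I^{*}_{index}$ indices from the possible indices
7:     Update indices $I^{*}_{index} \leftarrow$ (...) [right-hand side illegible in the source]
8:     $X_1 \leftarrow \{X_1 \cup X_2(I^{*}_{index})\}$
9:     $X_2 \leftarrow \{X_2 \setminus X_2(I^{*}_{index})\}$
10:    Compute $\alpha^*$ and $\rho^*$ by minimizing

$$\frac{1}{2} \sum_{i,j \in X_1} \alpha_i \alpha_j k(x_i, x_j) + \rho \Big( \ell\nu - \sum_{i \in X_1} \alpha_i \Big)$$

       subject to $0 \le \alpha_i \le 1$, $\nu \in [0, 1]$, $i \in X_1$
11:    Evaluate decision function,

$$S_{X_2} = \operatorname{sign}\Big( \sum_{i} \alpha_i k(x_i, x_j) - \rho \Big)$$

12:    Obtain classification rate $C_r$ on $Z$.

$$M = \{ z_m : m \in [r], f(z_m) < 0 \}$$

$$C_r = 1 - \frac{1}{N} \sum_{n=1}^{N} \mathbb{1}(z_n \in M)$$

13:    Check objective $E_r = C_r - (1 - \nu)$.
14:    if $E_r \approx 0$ then
15:      Return
16:    end if
17:  end while
18: end while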
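
The control flow of the update loop can be sketched in a few lines of Python. This is only an illustration under several loudly stated assumptions: the one-class SVM solver is replaced by a toy centroid-and-radius model (`fit_centroid_model`, which is not an SVM), the batch size rule is a hypothetical stand-in for the penalized weight function, the tolerance and iteration cap are arbitrary, and using the same comparison operator for the error and for the candidate scores is one reading of lines 3 and 4 of the pseudo code.

```python
import math
import random

def fit_centroid_model(points, nu):
    """Stand-in for a one-class SVM solver: centroid plus a radius that leaves
    roughly a nu fraction of the training points outside. NOT an SVM."""
    dim = len(points[0])
    centre = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    dists = sorted(math.dist(p, centre) for p in points)
    keep = max(1, math.ceil((1.0 - nu) * len(dists)))
    radius = dists[keep - 1]
    return lambda z: radius - math.dist(z, centre)   # score >= 0 means "normal"

def classification_rate(score, validation):
    return sum(1 for z in validation if score(z) >= 0) / len(validation)

def anomica_sketch(x1, x2, validation, nu, fit=fit_centroid_model,
                   tol=0.02, max_iter=50, seed=0):
    rng = random.Random(seed)
    x1, x2 = list(x1), list(x2)
    score = fit(x1, nu)
    for _ in range(max_iter):
        e_r = classification_rate(score, validation) - (1.0 - nu)
        if abs(e_r) <= tol or not x2:
            break
        # Same operator for E_r and for the candidate scores (assumed reading).
        if e_r > 0:
            candidates = [i for i, x in enumerate(x2) if score(x) > 0]
        else:
            candidates = [i for i, x in enumerate(x2) if score(x) < 0]
        if not candidates:
            break
        # Hypothetical step length: grows with |E_r|, capped by what is available.
        batch = min(len(candidates), max(1, math.ceil(abs(e_r) * len(candidates))))
        chosen = set(rng.sample(candidates, batch))
        x1 += [x2[i] for i in chosen]                 # move chosen points into X_1
        x2 = [x for i, x in enumerate(x2) if i not in chosen]
        score = fit(x1, nu)                           # retrain on the enlarged set
    return score, x1

rng = random.Random(1)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(600)]
train, validation = data[:400], data[400:]
initial, remaining = train[:60], train[60:]           # 15% initial subset
model, working_set = anomica_sketch(initial, remaining, validation, nu=0.1)
```

Passing a different `fit` callable, for example a wrapper around an actual one-class SVM solver, would leave the loop itself unchanged.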


With a fixed number of instances, the redundancies in
the data set were controlled by varying the covariance
of the distribution. In the first run, two data sets, each
of 1001 instances, were generated from a distribution
with the same mean (0.001) but with two very different
covariances. For one set the covariance was set to
“machine precision” (eps), which is the minimum
allowable spacing between two floating point numbers,
and to 10^20 × eps for the other set. The outcome of the
one-class SVMs model (with ν = 0.1) on these two data sets
is summarized in Table I. It can be observed that
even though the redundancies vary widely from
one set to the other, the total number of support vectors
still remains the same because of the ν-criterion. Hence
there is a possibility that the ν parameter may introduce
redundancies in the solution.
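
The support vector counts in Table I can be related to the lower bound stated in Section II, namely that at least νℓ training points have non-zero multipliers. Since the count must be an integer, the bound rounds up. A short check, using a hypothetical helper name:

```python
import math

def min_support_vectors(nu, n_train):
    """Smallest integer count satisfying the 'at least nu * l support vectors' bound."""
    return math.ceil(nu * n_train)

bound = min_support_vectors(0.1, 1001)   # nu * l = 100.1
```

With ν = 0.1 and 1001 training points the bound is 101, which matches the 100 non-margin plus 1 margin support vectors reported for both covariance settings.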

Table I
Here we compare two cases to check the redundancy of
classical one-class SVMs using a synthetic data set.

Training size   Covariance     SVs (Exact)
                               Non-margin   Margin
1001            eps            100          1
1001            10^20 × eps    100          1

The ν-Anomica algorithm described in the earlier
sections is an extension of the classical one-class
SVMs. It has been shown that for both these methods
the fundamental optimization procedure is exactly the
same. In the following we will present an interesting study
on how these two techniques may produce a different
outcome and try to provide some insight on what makes
them different.

We also included a separate experiment where both
ν-Anomica and classical one-class SVMs were
developed on the same data set and the corresponding
SVs were noted. Each support vector obtained by the
classical approach was evaluated using the hyperplane
constructed by the exact solution itself and the
hyperplane constructed by the approximate solution. In
Fig. 3, scores for the support vectors from both solutions
have been compared. The plots represent the absolute
values of the original scores, sorted in descending order.
With normalization, these scores lie almost on top
of each other. This is because the decision values for
both these methods will be proportional (Eqn. 8).
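
The effect of normalization on proportional scores can be illustrated directly. The sketch below assumes that normalization means dividing by the largest absolute score, which is not specified in the text, and uses hypothetical score values that differ by a constant factor.

```python
def normalized_sorted_scores(scores):
    """Absolute values, sorted in descending order, scaled so the largest is 1."""
    magnitudes = sorted((abs(s) for s in scores), reverse=True)
    top = magnitudes[0]
    return [m / top for m in magnitudes]

exact_scores = [-0.8, 0.4, -0.2, 0.1]
approx_scores = [2.5 * s for s in exact_scores]   # proportional, as Eqn. 8 suggests

exact_curve = normalized_sorted_scores(exact_scores)
approx_curve = normalized_sorted_scores(approx_scores)
```

Because the constant factor cancels in the division, the two normalized curves coincide, which is the behaviour seen in Fig. 3.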



Figure 3. This figure represents the normalized scores from classical
one-class SVMs and ν-Anomica.

IV. Experimental Results 

In this study, we have chosen two systems health
management related data sets and one real-world
astronomical data set as benchmark applications. These
data sets represent diverse training set sizes and input
dimensionality and therefore build a good platform to
test the accuracy and scalability of these algorithms.
Table II summarizes the characteristics of the data sets
used for the experiments. Both one-class SVMs and
ν-Anomica algorithms have been tested on a dual core
Pentium 4 computer running Windows XP with 4 GByte
of memory. The current version of our algorithms is
based on the OSU SVM Classifier Toolbox (ver. 3.00)¹
and is written using Matlab. The OSU SVM Toolbox
is an adaptation of LIBSVM and uses Sequential
Minimal Optimization (SMO) for solving the quadratic
problem (Eqn. 3). To test these algorithms, a nonlinear
RBF kernel was used and the optimal setting of the
kernel parameter was determined using the method
described in [20]. In addition, it should be noted
that for all analyses using ν-Anomica the size of the
initial subset is chosen to be 15% of the entire training
set. However, this parameter can vary depending on the
problem size.

We first experiment with the emulated OPAD [19]
(Optical Plume Anomaly Detection) data, which is a
set of time varying spectra profiles measured by an
optical plume analysis in liquid propulsion engines. A
second set of experiments was conducted on Sloan
Digital Sky Survey (SDSS) photometry data (SDSS
DR6²) for testing the large scale training capabilities
of our algorithms. The Commercial Modular
Aero-Propulsion System Simulation (CMAPSS) data set has
been used for the final set of analyses. CMAPSS is
a high fidelity system level engine simulation software
for simulating user-specified transient engine behavior
under normal and faulty conditions over flights. Detailed
background on the CMAPSS framework can be found
in [11]. The above data sets were split into
non-overlapping training, validation and test sets as shown
in Table II.

Baseline results were obtained by running the one-class
SVMs model and compared with those obtained from
ν-Anomica on the above data sets. Three sets of results
are reported for analyzing the correct classification
accuracy, sensitivity and time complexity of these
algorithms. For the CMAPSS data set, we will only summarize
the outcomes of the analysis due to space limitations.

A. Run Time Analysis of ν-Anomica

Figure 4(a) shows the resulting training times for the
exact solution and ν-Anomica with five different sizes
of training set on OPAD data. The exact solution uses
the entire training set in all cases. ν-Anomica starts
with an initial model built on a small subset of the
entire training data set and updates the training set as
it progresses towards the target (1 − ν) classification
rate on the validation set. In Fig. 4(a), we show the

1 http://svm.sourceforge.net/download.shtml 

2 http://www.sdss.org/dr6/ 




Table II

Details on the Data Sets used to Test the ν-Anomica Algorithms

Data set   Source           Variable type   Number of    Total instances
                                            variables    Training     Validation   Testing
OPAD       Emulator         Continuous      1024         5×10^3       5×10^3       2×10^3
CMAPSS     Simulator        Continuous      29           500×10^3     20×10^3      100×10^3
SDSS       Real life data   Continuous      12           275×10^3     10×10^3      130×10^3


[Figure 4, panel (a): training time versus number of training points
(1000 to 5000) for the exact solution and ν-Anomica.]

(a) This graph shows the mean training time complexity
with symmetric error bars of 2σ long over 50 runs.

[Figure 4, panel (b): test time versus number of training points
(1000 to 5000) for the exact solution and ν-Anomica. Legible nSVs
annotations in the source: 501, 400, 200, 101, 51, 33, 18, 62, 79.]

(b) This graph shows the mean test time complexity with
symmetric error bars of 2σ long over 50 runs. In addition,
for both classifiers, the number of support vectors for each
case has been indicated by the variable nSVs.

Figure 4. Training (a) and test (b) times of the one-class SVMs
model and ν-Anomica with different sizes of the training sets using
OPAD data.


mean training time over 50 runs for varying training
sizes and their corresponding error bars. It is clear that
with fewer training points the difference in training
time between the exact solution and ν-Anomica is low. As the
size of the training data set increases, the computing
time increases drastically for the exact solution, whereas
ν-Anomica shows much better performance. Table III
presents the performance of these algorithms on the
SDSS. It can be observed that the proposed technique
outperforms the one-class SVM model for all the test
cases and the performance gain factor increases with
increasing training set size. In Fig. 4(b), we present
the time required to evaluate the OPAD test sets. As
the number of SVs increases, the resultant test time
increases proportionally, and this trend can be
seen in the plot. Since ν-Anomica requires fewer SVs
while building a model, the test time is lower compared
to the classical approach. On the SDSS data set, with 275k
training and 130k test instances, ν-Anomica is on
average approximately 15 times faster than the classical
method. With more training instances, as with the
CMAPSS data (500k training and 100k test instances),
ν-Anomica consistently performs on average 18 times
faster.
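
The proportionality between test time and the number of SVs follows from counting kernel evaluations: scoring one test point with the decision function (Eqn. 5) needs one kernel evaluation per support vector. The sketch below illustrates the count with hypothetical support vector numbers chosen only for the example; it ignores all other overheads.

```python
def kernel_evaluations(n_test, n_support_vectors):
    """Kernel evaluations needed to score n_test points with the decision function."""
    return n_test * n_support_vectors

# Hypothetical support-vector counts, chosen only for illustration.
exact_cost = kernel_evaluations(n_test=2000, n_support_vectors=500)
reduced_cost = kernel_evaluations(n_test=2000, n_support_vectors=80)
speedup = exact_cost / reduced_cost
```

Under these assumed counts the reduced model needs 6.25 times fewer kernel evaluations, independent of the size of the test set.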

B. Classification Accuracy and Prediction Performance 
[Figure 5 plot: classification rate versus number of training points
(1000 to 5000) for the exact solution and ν-Anomica.]

Figure 5. Figure comparing the classification rate of the test set using
classical one-class SVMs and the ν-Anomica algorithm with different
sizes of the training sets using OPAD data.


It is of real interest to find out whether the
computational advantage of ν-Anomica trades off with the
detector’s ability to match the classification accuracy of
the exact solution of one-class SVMs. Figure 5 shows a
comparison of the detection rates of both algorithms.
These results were obtained on the same test set while the
sizes of the training sets were varied. It can be seen that
ν-Anomica overall provides similar accuracies when
compared to the one-class SVM, but computed with much
reduced training times. As the training size increases,
the models get more accurate and as a result the
classification rates of both models become closer and more
consistent. This is because introducing more training
examples brings in additional useful information that
aids correct detection and classification.



[Figure 6 plot: normalized outlier scores; x-axis: indices representing
the ranking of detected outliers.]

Figure 6. Figure showing the normalized scores of the outliers
detected in a test set from OPAD data using one-class SVMs and
ν-Anomica, arranged in descending order.

Now we present an analysis on predicting the
“outlierness” of new unseen patterns. Figure 6 indicates
that ν-Anomica ranked the points in terms of their
“outlierness” comparably to classical one-class SVMs.
This can be observed from the plot, where both one-class
SVMs and ν-Anomica have been used to predict
a set of outliers in an unlabeled data set and their
corresponding outlier scores were compared. These outliers
were sorted based on the absolute values of their scores
and thereafter normalized. Finally, to investigate the
accuracy in separating the sequence of outliers from
normal patterns, ROC analysis on the predictions of
ν-Anomica was performed and the area under the
ROC curve (AUC) was computed for each run. Here we have
assumed that the sequence of outliers detected by one-class
SVMs is the ground truth. Results obtained show
that ν-Anomica consistently performed well in detecting
the presence of these outliers and for each case the AUC
was very close to 1.
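
An AUC of this kind can be computed without plotting the ROC curve, using the rank-sum identity: the AUC equals the fraction of (outlier, normal) pairs in which the outlier receives the higher outlier score, with ties counted as one half. The sketch below is a minimal pure-Python version of that identity; the scores and the reference labels are hypothetical, with the labels standing in for the outliers reported by the exact one-class SVM.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.
    labels: 1 for reference outliers, 0 otherwise; higher score = more outlying."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one outlier and one normal point")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical outlier scores from the approximate model; labels from the exact model.
approx_outlier_score = [0.9, 0.8, 0.35, 0.3, 0.2, 0.1]
exact_says_outlier = [1, 1, 0, 1, 0, 0]
auc = roc_auc(approx_outlier_score, exact_says_outlier)
```

In this toy example one reference outlier is ranked below one normal point, so 8 of the 9 pairs are ordered correctly. An AUC close to 1 means the two models order the points almost identically.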

V. Conclusion 

In this paper, we presented a new method for faster
anomaly detection using a modified one-class SVM.
Compared to the classical one-class SVM, all our
experiments showed a competitive speedup (up to a factor of
15-18 on these data sets). The proposed method reduces the
number of operations needed to compute a reduced
and near optimal training set. The model developed on
this working set is a close approximation of the exact
solution and can be represented with far fewer
SVs. Hence both training time and test time are
significantly reduced. At the same time, ν-Anomica achieves
very close classification accuracies (losing less than 1%
in most cases) compared to one-class SVMs. The paper
demonstrates the preliminary success of the proposed
method on a wide variety of data sets. Also, from all
the experimental observations we find that the model
converges in a finite number of iterations, which ensures
that the cardinality of the final training set is always
less than the cardinality of the entire training set. We
note that the current version of the paper does not have
a theoretical upper bound on the number of support
vectors, but we intend to consider this in our future
research.